Classifying complex documents: comparing bespoke solutions to large language models

December 12, 2023 · One min read

Sung Kim

TaxAgents Team Member

Author(s)

Glen Hopkins, Kristjan Kalm

Abstract

Here we search for the best automated classification approach for a set of complex legal documents. Our classification task is not trivial: our aim is to classify approximately 30,000 public courthouse records from 12 states and 267 counties at two different levels using nine sub-categories. Specifically, we investigated whether a fine-tuned large language model (LLM) can achieve the accuracy of a bespoke custom-trained model, and how much fine-tuning is necessary.

Links to paper

Link to arXiv: https://arxiv.org/abs/2312.07182
Link to pdf: https://arxiv.org/pdf/2312.07182.pdf
